ABSTRACT - Delivering high-fidelity audio over wireless channels is a challenging task because the wireless channel presents not
only erasure errors, but also random bit errors. These errors have a severe adverse effect on decompressing the received audio
bitstream and may crash the decoder completely. To solve this problem, we propose error resilient scalable audio coding (ERSAC)
for mobile applications by performing data partition and reversible variable length coding in the scalable audio bitstream.
Simulation results show that ERSAC provides very effective error resilience in addition to bitstream scalability.

1. INTRODUCTION

With the advent of the Internet age, streaming high-fidelity audio has become a reality. It is thus natural to extend audio
streaming to wireless communications so that mobile users can listen to music from handheld devices. However, delivering
audio over wireless channels is a very challenging task because the wireless channel presents not only erasure errors caused
by large-scale path loss and fading, but also random bit errors due to the wireless connection. These bit errors have a severe
adverse effect on decompressing the received bitstream. If not handled properly, they will crash the decoder. To combat
these errors, forward error correction (FEC) has been used for protecting the compressed data [1] [2] [3] [4]. However, no
matter how carefully the data are protected before transmission, the received data may still have bit errors. Therefore,
studying error resilience in audio coding is necessary for audio over wireless channels to overcome the random bit errors.
While there has been work done on error resilient video coding [5] [6], to the best of our knowledge, there is no report on
error resilient audio coding in the literature. Error resilient schemes for video coding cannot be directly ported to audio
coding because the characteristics of audio and video are different. There exists strong correlation between adjacent video
frames that can be exploited to recover the data corrupted in the transmission [5] [6]. In contrast, there is almost no
correlation between adjacent audio frames in the time domain. Moreover, audio coding artifacts caused by corrupted frames
are very annoying to human ears. In this work, we propose error resilient scalable audio coding (ERSAC), in which we
employ data partition and reversible variable length codes (RVLC) for scalable audio coding. Data partition is applied to
limit the error propagation between different data segments, while RVLC are used to locate errors and minimize error
propagation.

2. SCALABLE AUDIO CODING
Scalable audio coding has become increasingly popular recently [4] [7] [8] because it can efficiently accommodate
bandwidth fluctuation. A scalable audio bitstream typically consists of a base layer plus a number of enhancement layers. It
is possible to use only a subset of the layers to decode the audio with lower sampling resolutions and/or quality. In
streaming applications, the layers in a scalable bitstream are selectively delivered to adapt to network bandwidth fluctuation
and packet loss level. For example, when the available bandwidth is low or packet loss ratio is high, only the base layer is
delivered.

In our original scalable audio codec, the audio signal is first split into individual time segments, which are filtered by a
polyphase quadrature filter and grouped into four subbands to facilitate scalability in sampling resolution. The modified
DCT (MDCT) is then performed on each subband and the resulting MDCT coefficients are weighted by a psychoacoustic mask
function. Finally, each weighted subband is encoded into an embedded bitstream using bit-plane coding, where each bit
plane is coded into one layer or data unit (DU).

3. ERROR RESILIENT SCALABLE AUDIO CODING 
3.1 Data Partition 

Each DU in the original audio bitstream consists of interleaved significance bits, sign bits, and refinement bits (see Fig. 1),
where necessary dummy zeros are added for byte alignment. The sign bits and refinement bits are not entropy coded, hence bit
errors among them will not propagate. In contrast, the significance bits are compressed with variable length codes (VLC).
When an error occurs in this portion of the bitstream, it will propagate and the whole DU will be damaged, including the
sign data as well as the refinement data. DU multiplexing makes this situation more complex because when the decoder
detects an error, it does not know the exact error location. As a result, the whole DU has to be discarded, no matter where
the error occurs.
Figure 1. Bitstream syntax in the original scalable audio coder.
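
To make the three kinds of bits concrete, the following is a minimal sketch of one generic bit-plane pass, written in Python. It is not the codec's actual implementation: the scan order, the function name `bitplane_pass`, and the sign convention (1 for negative) are assumptions made for illustration only.

```python
def bitplane_pass(coeffs, plane, significant):
    """Code one bit-plane of integer coefficients (illustrative sketch).

    `significant` holds the indices that became significant in earlier
    (higher) bit-planes and is updated in place.
    Returns (significance_bits, sign_bits, refinement_bits).
    """
    sig_bits, sign_bits, ref_bits = [], [], []
    for i, c in enumerate(coeffs):
        bit = (abs(c) >> plane) & 1
        if i in significant:
            ref_bits.append(bit)                      # refinement bit, stored raw
        else:
            sig_bits.append(bit)                      # significance bit, entropy coded later
            if bit:
                sign_bits.append(1 if c < 0 else 0)   # sign bit, stored raw
                significant.add(i)
    return sig_bits, sign_bits, ref_bits


coeffs = [5, -3, 0, 6]          # hypothetical weighted MDCT coefficients
significant = set()
layers = [bitplane_pass(coeffs, p, significant) for p in (2, 1, 0)]
for sig, sign, ref in layers:
    print(sig, sign, ref)
```

In this sketch each layer has exactly one sign bit per binary one in its significance bits, and the number of refinement bits in a layer equals the number of coefficients that became significant in earlier layers.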

In our proposed ERSAC scheme, we first deinterleave the significance data, the sign data, and the refinement data and
put them in three independent partitions. Thus, any error in the DU can be isolated and restricted to a particular partition.
To locate errors among different partitions, the decoder must know the partition boundaries. However, this is impossible
without a priori side information. So we put the refinement bits before other bits. This way the decoder can deduce the size
of the refinement data from the DUs in previous layers. This resolves the ambiguity about the refinement partition. To
finish the job, we use a significance/sign boundary mark (SBM) to distinguish the significance partition from the sign
partition. Because the VLC used in our scheme have a finite code tree, we can pick the SBM as an invalid codeword. In
addition, for error robustness reasons, we choose the SBM to be sufficiently far in Hamming distance from other codewords
so that it can be detected even if it is corrupted. The new data structure in ERSAC is depicted in Fig. 2. The length of the
SBM is two or three bytes, which is the only overhead caused by ERSAC. It is very small and can be ignored, compared to
the length of the DUs.
Figure 2. Data structure in our proposed ERSAC scheme.
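
The partitioning idea can be sketched as follows. This is only an illustration under stated assumptions: the 16-bit `SBM` pattern, the order "refinement, significance, SBM, sign", the tolerance `max_dist`, and the function names are hypothetical, and bits are held in Python strings for readability. A real marker would also have to be an invalid codeword of the VLC in use, which this toy pattern does not guarantee.

```python
SBM = "1111111100000001"   # hypothetical 16-bit marker, not the value used in the paper


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def split_du(du, n_refinement, sbm=SBM, max_dist=2):
    """Split a DU into (refinement, significance, sign) partitions.

    The refinement size is assumed known from earlier layers. The SBM is
    located as the window closest to `sbm` in Hamming distance, accepted
    only if the distance is at most `max_dist`, so a lightly corrupted
    marker can still be found.
    """
    refinement, rest = du[:n_refinement], du[n_refinement:]
    best_pos, best_dist = None, max_dist + 1
    for pos in range(len(rest) - len(sbm) + 1):
        d = hamming(rest[pos:pos + len(sbm)], sbm)
        if d < best_dist:
            best_pos, best_dist = pos, d
    if best_pos is None:
        return refinement, None, None          # marker not found
    return refinement, rest[:best_pos], rest[best_pos + len(sbm):]


# One bit of the marker is flipped in transit; the boundary is still recovered.
corrupted_sbm = "1110111100000001"
du = "0110" + "10110" + corrupted_sbm + "01"
print(split_du(du, n_refinement=4))
```

Choosing the closest window rather than the first acceptable one keeps the search from stopping at an earlier, weaker match.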

3.2 Reversible Variable Length Codes

Reversible variable length codes (RVLC) are special VLC that can be decoded instantaneously both in the forward and
backward directions [10]. When bit errors occur, the decoder can locate them by comparing the decoded results in the two
different directions. Owing to this, RVLC have received significant attention recently [12] and have been applied widely to
error resilient video coding [13]. Generally, the two-way decoding property of RVLC will reduce the coding efficiency.
However, there are some special types of RVLC that allow two-way decoding while retaining the efficiency of traditional
(nonreversible) VLC. Reversible exponential Golomb (Exp-Golomb) codes [10] belong to this category. They were
originally proposed in [12] as an extension of the Exp-Golomb codes [9]. Their length distribution is identical to that of the
Exp-Golomb codes. Therefore, they can increase the robustness to channel errors while suffering no loss in coding efficiency.

Like Golomb codes, Exp-Golomb codes are associated with an order that is small for coding low entropy sources and
large for coding high entropy sources. For binary sources, the optimal order can be calculated from the probability of zero.
Depending on the order, each codeword consists of a variable-length prefix and a fixed-length suffix. Exp-Golomb codes are
not sensitive to the value of the order, and the range of the order is very limited. Hence, it is easy to choose a suitable order.
In our experiments, the order is determined by the property of the significance bits in bit-plane coding. We set it to one in
the first two bit-planes and two in other bit-planes.
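
For reference, the sketch below implements the ordinary (nonreversible) order-k Exp-Golomb code for non-negative integers, using the common convention of a run of leading zeros followed by the binary value. It is meant only to show how the order changes codeword lengths; the bit convention and function names are assumptions, and the reversible variant used in ERSAC arranges the bits differently while keeping the same lengths.

```python
def exp_golomb_encode(n, k):
    """Order-k Exp-Golomb codeword for a non-negative integer n, as a bit string."""
    m = n + (1 << k)
    # (bit_length - k - 1) leading zeros, then m in binary
    return "0" * (m.bit_length() - k - 1) + format(m, "b")


def exp_golomb_decode(bits, k, pos=0):
    """Decode one codeword starting at `pos`; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    width = zeros + k + 1                      # number of bits holding m
    start = pos + zeros
    m = int(bits[start:start + width], 2)
    return m - (1 << k), start + width


for k in (1, 2):                               # the two orders mentioned in the text
    print(k, [exp_golomb_encode(n, k) for n in range(6)])

# Decoding a concatenated stream of order-1 codewords
stream = "".join(exp_golomb_encode(n, 1) for n in (0, 5, 2))
pos, values = 0, []
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, 1, pos)
    values.append(value)
print(values)
```

A larger order shortens the codewords for large values at the cost of longer codewords for small ones, which is why a higher order suits higher entropy sources.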

Reversible Exp-Golomb codes are applied to the significance bits in our ERSAC scheme. Note that the codewords have
a finite code tree. Some nodes on the code tree are thus invalid and can serve as "traps" to detect errors. Once the
decoder encounters an invalid codeword, it knows that errors exist in the bitstream, although it does not know the exact
locations. Normally the received significance data are decoded both in the forward and backward directions. In case of an
error, the decoder will locate it from either the forward or the backward decoding pass.
Because errors in the sign and refinement bits are non-propagating, our main task is to handle errors in the
significance bits coded by reversible Exp-Golomb codes. Due to data partition and the SBM, the boundary of the
significance data can be known in advance. RVLC can then be used to track and locate the errors. Normally the
significance data is decoded both in the forward and backward directions. When an error (e.g., an invalid codeword) is
detected, the reversible Exp-Golomb decoder will stop and locate it in either decoding direction. Furthermore, one can
apply sanity checks on the decoded significance bits because the number of the significance bits is known before decoding
and the number of binary ones in them must be identical to the number of sign bits. If no errors are detected in both the
forward and backward decoding directions and the decoded data pass the sanity check, the decoding result is declared to be
correct. If an error happens in decoding, the results of the two decoders will be compared and identical portions in the two
decoded versions are considered to be correct. By this means, most potentially correct bits can be utilized in the subsequent
source decoding stage.
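
The acceptance test described above can be illustrated with a toy example. Everything specific here is an assumption for illustration: the four-word symmetric code in `CODE` stands in for the reversible Exp-Golomb code, each codeword is taken to represent a run of zeros terminated by a one, and the function names are hypothetical. The sketch covers only the "declare correct" path (no trap in either direction, both passes agree, sanity check passes), not the salvaging of partially correct data.

```python
# Toy symmetric RVLC: palindromic, prefix-free codewords decode in both directions.
# Each codeword maps to a run of zeros terminated by a one (hypothetical mapping).
CODE = {"0": 0, "11": 1, "101": 2, "1001": 3}
PREFIXES = {w[:i] for w in CODE for i in range(1, len(w) + 1)}


def decode_until_error(bits):
    """Decode left to right; return (runs, error_flag)."""
    runs, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE:
            runs.append(CODE[buf])
            buf = ""
        elif buf not in PREFIXES:
            return runs, True          # invalid codeword: a "trap" was hit
    return runs, buf != ""             # leftover bits also indicate an error


def check_partition(bits, n_sig_bits, n_sign_bits):
    """Return the significance bits if every check passes, else None."""
    fwd, err_f = decode_until_error(bits)
    bwd, err_b = decode_until_error(bits[::-1])   # backward pass on reversed bits
    if err_f or err_b or fwd != bwd[::-1]:
        return None
    sig = [bit for run in fwd for bit in [0] * run + [1]]
    # Sanity check: known bit count, and one sign bit per binary one
    if len(sig) != n_sig_bits or sum(sig) != n_sign_bits:
        return None
    return sig


print(check_partition("01011110010", 11, 5))   # clean data: accepted
print(check_partition("01011110000", 11, 5))   # one bit flipped: trap hit
print(check_partition("01010010010", 11, 5))   # two bits flipped: only the sanity check catches it
```

The third call shows why the sanity check matters: the corrupted stream is still a valid codeword sequence in both directions, and only the mismatch between the number of ones and the number of sign bits reveals the error.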

4. SIMULATION RESULTS

Extensive simulations are carried out to test the performance of our proposed ERSAC scheme. The MPEG-4 standard
audio clips horn23_2, trpt21_2 and vioo10_2 are used. The scalable audio coder encodes each audio clip at a fixed rate of
64 kbps. Two simulated wireless network conditions are picked: one using the Gilbert model with different BERs and the
other one having different BERs with Rayleigh fading. Note that such network conditions are obtained after L1, L2 and L3
of the wireless network and are very typical in mobile applications. We assume that no channel coding is applied to the
scalable bitstream and that the header information is well protected. We illustrate the quality of the decoded audio through
the noise-to-mask ratio (NMR), where a lower value of the NMR indicates better quality of the decoded audio. In the
simulations, our ERSAC scheme is compared to the original scheme, referred to as SAC for simplicity.

In our first set of experiments, we simulate a wireless network environment using the Gilbert model with channel fading
length of 4 bits at the link layer. The BERs at the application layer are 10e-3 and 10e-4, respectively. The NMRs of SAC
and ERSAC under these BERs are shown in Fig. 3. There are some bursty peaks on the curves for SAC, showing the
occurrence of bit errors, but none on the curves for ERSAC. Numerical NMR results from streaming three audio clips
(horn23_2, trpt21_2 and vioo10_2) are shown in Table 1. These results are the average values over 10 runs because of the
random nature of the simulated channel.
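
A bursty bit-error channel of this kind can be approximated with a two-state Gilbert model, sketched below. The parameterization is an assumption, not the setup used in the experiments: the "fading length of 4 bits" is read as a mean stay of 4 bits in the bad state, bits in the bad state are flipped with a hypothetical probability `err_in_bad`, and the good state is error free. The target BER is whatever value the caller passes in.

```python
import random


def gilbert_params(ber, mean_burst_len=4.0, err_in_bad=0.5):
    """Transition probabilities (good->bad, bad->good) for a target average BER."""
    r = 1.0 / mean_burst_len                 # P(bad -> good): mean burst of 4 bits
    pi_bad = ber / err_in_bad                # required stationary probability of "bad"
    p = r * pi_bad / (1.0 - pi_bad)          # P(good -> bad) giving that stationary share
    return p, r


def gilbert_errors(n_bits, ber, mean_burst_len=4.0, err_in_bad=0.5, seed=0):
    """Return a list of 0/1 error flags, one per transmitted bit."""
    p, r = gilbert_params(ber, mean_burst_len, err_in_bad)
    rng = random.Random(seed)
    bad, errors = False, []
    for _ in range(n_bits):
        # Markov state update, then draw an error only while in the bad state
        bad = (rng.random() >= r) if bad else (rng.random() < p)
        errors.append(1 if bad and rng.random() < err_in_bad else 0)
    return errors


errors = gilbert_errors(200_000, ber=1e-3)
print("measured BER:", sum(errors) / len(errors))
```

XOR-ing the resulting flags onto a coded bitstream gives clustered errors, unlike an independent bit-flip channel with the same average rate.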

Table 1. Average NMR results for multiple audio clips with different BERs

Audio clip    horn23_2                trpt21_2                vioo10_2
BER           10e-3       10e-4       10e-3       10e-4       10e-3       10e-4
SAC           4.7804      2.8476      4.6082      2.9369      4.7705      2.9483
ERSAC         4.0961      2.5872      4.3261      2.6055      4.2713      2.1820





Figure 3. NMRs of SAC and ERSAC with BER being 10e-3 (left) and 10e-4 (right). The audio clip is horn23_2
and the bandwidth is 64 kbps.

In our second set of experiments, we simulate a wireless network environment with Rayleigh fading. The parameters are:
Eb/N0 = 22 dB for BER = 1e-3 and Eb/N0 = 32 dB for BER = 1e-4. The Doppler spread is 5.27 Hz, which is computed from
$f_d = v f_c / c$, with the walking velocity $v = 3$ km/h (or 0.83 m/s), the carrier frequency $f_c = 1900$ MHz, and light
velocity $c = 3 \times 10^{8}$ m/s. The NMRs of SAC and ERSAC under BERs of 10e-3 and 10e-4 are shown in Fig. 4. Again, there are some
bursty peaks on the SAC curves, but almost none on the ERSAC curves. Average NMR results from streaming horn23_2,
trpt21_2 and vioo10_2 under constant network bandwidth (64 kbps) and different BERs are shown in Table 2.
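
As a quick check on the Doppler spread used in the Rayleigh fading experiment, the maximum Doppler shift for a receiver moving at walking speed can be worked out from the stated velocity, carrier frequency, and speed of light. The rounding below is approximate.

```latex
f_d = \frac{v\, f_c}{c}
    = \frac{0.83\ \text{m/s} \times 1.9 \times 10^{9}\ \text{Hz}}{3 \times 10^{8}\ \text{m/s}}
    \approx 5.3\ \text{Hz}
```

A Doppler spread of only a few hertz corresponds to slow fading, so deep fades last a long time relative to a bit period and errors arrive in bursts.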





Figure 4. NMRs of SAC and ERSAC with BER being 10e-3 (left) and 10e-4 (right). The audio clip is horn23_2
and the bandwidth is 64 kbps.

Table 2. Average NMR results for multiple audio clips with different BERs

Audio clip    horn23_2                trpt21_2                vioo10_2
BER           10e-3       10e-4       10e-3       10e-4       10e-3       10e-4
SAC           4.5713      2.6185      4.4139      2.7738      4.4192      3.0124
ERSAC         4.2045      2.1698      4.1738      2.1050      4.0264      2.1757



These results show that ERSAC indeed improves the error resilience of scalable audio coding and is more immune to
bit errors than SAC. Subjective tests are also conducted. Listeners perceive a better quality of the delivered audio given
by ERSAC, while annoying artifacts are audible from SAC.